CONTINUITY OF MUTUAL ENTROPY IN THE LIMITING 
SIGNAL-TO-NOISE RATIO REGIMES 

Mark Kelbert and Yuri Suhov

Abstract. This article addresses the proof of the entropy power inequality (EPI), an important tool in the analysis of Gaussian channels of information transmission, proposed by Shannon. We analyse continuity properties of the mutual entropy of the input and output signals in an additive memoryless channel and discuss assumptions under which the entropy-power inequality holds true.

1. Introduction

The aim of this paper is to analyse continuity properties of mutual and conditional entropies between the input and output of a channel with additive noise. Our attention is focused mainly on a distinctly non-Gaussian situation, for both large and small signal-to-noise ratio. To our knowledge, this nontrivial aspect has not been discussed before at the level of generality adopted in this paper. The complex character of the continuity properties of various entropies was acknowledged as early as the 1950s; see, e.g., paper [1], where a number of important (and elegant) results were proven about the limiting behaviour of various entropies.

An additional motivation was provided by the short note [9] suggesting an elegant method of deriving the so-called entropy-power inequality (EPI). The way of reasoning in [9] is often referred to as the direct probabilistic method, as opposed to the so-called analytic method; see [6], 0, 0, [8]. The results of this paper (Lemmas 2.1-2.4 and Lemma 3.1) provide additional insight into the assumptions under which the direct probabilistic method can be used to establish the EPI in a rigorous manner. For completeness, we give in Section 4 a short derivation of the EPI in which we follow Ref. [9] but clarify a couple of steps thanks to our continuity lemmas. However, without rigorously proven continuity properties of mutual and conditional entropies in both signal-to-noise ratio regimes, the derivation of the EPI via the direct probabilistic method cannot be accomplished.

Another approach to the EPI, for discrete random variables (RVs) where it takes a different form, is discussed in [4]; see also references therein. For the history of the question, see Q; for the reader's convenience, the statement of the EPI is given at the end of this section.

To introduce the entropy power inequality, consider two independent RVs $X_1$ and $X_2$ taking values in $\mathbb{R}^d$, with probability density functions (PDFs) $f_{X_1}(x)$ and $f_{X_2}(x)$, respectively, where $x\in\mathbb{R}^d$. Let $h(X_i)$, $i=1,2$, stand for the differential entropies

$$h(X_i) = -\int f_{X_i}(x)\ln f_{X_i}(x)\,dx := -\mathbb{E}\ln f_{X_i}(X_i),$$

and assume that $-\infty < h(X_1), h(X_2) < +\infty$. The EPI states that

$$e^{\frac{2}{d}h(X_1+X_2)} \ge e^{\frac{2}{d}h(X_1)} + e^{\frac{2}{d}h(X_2)}, \qquad (1.1)$$

or, equivalently,

$$h(X_1+X_2) \ge h(Y_1+Y_2), \qquad (1.2)$$

where $Y_1$ and $Y_2$ are any independent normal RVs with $h(Y_1)=h(X_1)$ and $h(Y_2)=h(X_2)$. This inequality was first proposed by Shannon [7]; as was mentioned earlier, it is used in the analysis of (memoryless) Gaussian channels of signal transmission. A rigorous proof of (1.1), (1.2) remains the subject of a growing literature; see, e.g., the references cited above. In particular, the question of under what conditions on the PDFs $f_{X_i}$ Eqns (1.1), (1.2) hold true remains largely open.
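
As a sanity check on the statement (1.1), the following minimal sketch evaluates both sides for independent Gaussian vectors, where the differential entropy is available in closed form. The function names `gaussian_entropy` and `epi_gap_gaussian` are illustrative and not part of the paper; the sketch assumes NumPy is available and uses only the standard formula $h = \frac12\ln((2\pi e)^d\det\Sigma)$.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of N(0, cov): 0.5 * ln((2*pi*e)^d * det(cov))."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def epi_gap_gaussian(cov1, cov2):
    """LHS minus RHS of (1.1) for independent N(0, cov1) and N(0, cov2)."""
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    d = cov1.shape[0]

    def power(h):
        # entropy power e^{(2/d) h}
        return np.exp(2.0 * h / d)

    # The sum of independent Gaussians is Gaussian with covariance cov1 + cov2.
    lhs = power(gaussian_entropy(cov1 + cov2))
    rhs = power(gaussian_entropy(cov1)) + power(gaussian_entropy(cov2))
    return float(lhs - rhs)
```

For proportional covariance matrices the gap should vanish up to rounding (the equality case), while for non-proportional ones it should be strictly positive.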

It is not hard to check that Eqn (1.1) is violated for discrete random variables (a trivial case where (1.1) is wrong is when $X_1$, $X_2$ take one or two values). Nevertheless, continuity properties of joint entropy remain true (although they look slightly different) when one or both of the RVs $X_1$, $X_2$ have atoms in their distributions, i.e., admit values with positive probabilities. In our opinion, these properties can be useful in a number of situations.
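
The violation of (1.1) in the discrete case can be seen in a small computation. The sketch below is an illustration under stated assumptions: it takes $d=1$, replaces the differential entropy in (1.1) by the Shannon entropy of the probability mass function, and uses two independent Bernoulli($p$) variables; the helper names are hypothetical.

```python
import math

def discrete_entropy(probs):
    """Shannon entropy in nats of a probability vector (zero terms are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def epi_sides_bernoulli(p):
    """Both sides of (1.1) with d = 1 for X1, X2 i.i.d. Bernoulli(p).

    Returns (lhs, rhs) where lhs = exp(2 H(X1 + X2)) and
    rhs = exp(2 H(X1)) + exp(2 H(X2)).
    """
    h_single = discrete_entropy([p, 1.0 - p])
    # X1 + X2 takes values 0, 1, 2 with binomial probabilities.
    h_sum = discrete_entropy([(1.0 - p) ** 2, 2.0 * p * (1.0 - p), p ** 2])
    lhs = math.exp(2.0 * h_sum)
    rhs = 2.0 * math.exp(2.0 * h_single)
    return lhs, rhs
```

With $p=0$ (a one-point distribution) the left-hand side equals 1 and the right-hand side equals 2, and with $p=0.1$ the left-hand side is again the smaller one, so (1.1) fails in both cases.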

When variables $X_1$ or $X_2$ have atoms, the corresponding differential entropies $h(X_1)$ and $h(X_2)$ are replaced with 'general' entropies:

$$h(X_i) = -\sum_x p_{X_i}(x)\ln p_{X_i}(x) - \int f_{X_i}(x)\ln f_{X_i}(x)\,dx$$
$$\qquad = -\int g_{X_i}(x)\ln g_{X_i}(x)\,m(dx) := -\mathbb{E}\ln g_{X_i}(X_i).$$

Here $\sum_x$ represents summation over a finite or countable set $\mathbb{D}$ $(=\mathbb{D}(X_i))$ of points $x\in\mathbb{R}^d$. Further, given an RV $X$, $p_X(x)$ stands for the (positive) probability assigned: $p_X(x)=\mathbb{P}(X=x)>0$, with the total sum $\eta(X) := \sum_x p_X(x) \le 1$. Next, $f_X$, as before, denotes the PDF for values forming an absolutely continuous part of the distribution of $X$ (with $\int f_X(x)\,dx = 1-\eta(X)$, so when $\eta(X)=1$, the RV $X$ has a discrete distribution, and $h(X) = -\sum_x p_X(x)\ln p_X(x)$). Further, $m$ $(=m_X)$ is a reference measure (a linear combination of the counting measure on the discrete part and the Lebesgue measure on the absolutely continuous part of the distribution of $X$) and $g_X$ is the respective Radon-Nikodym derivative:

$$g_X(x) = p_X(x)\mathbf{1}(x\in\mathbb{D}) + f_X(x), \quad\text{with } \int g_X(x)\,m(dx) = 1.$$

We will refer to $g_X$ as a probability mass function (PMF) of RV $X$ (with a slight abuse of traditional terminology). It is also possible to incorporate an (exotic) case where an RV $X_i$ has a singular continuous component in its distribution, but we will not consider this case in the present work.

It is worth noting that the scheme of proving Eqn (1.1) for a discrete case fails in Lemma 4.1 (see below).



2. Continuity of the mutual entropy 

Throughout the paper, all random variables take values in $\mathbb{R}^d$ (i.e., are $d$-dimensional real random vectors). If $Y$ is such an RV then the notation $h(Y)$, $f_Y(x)$, $p_Y(x)$, $g_Y(x)$ and $m(dx)$ has the same meaning as in Section 1 (it will be clear from the local context which particular form of the entropy $h(Y)$ we refer to).

Similarly, $f_{X,Y}(x,y)$ and, more generally, $g_{X,Y}(x,y)$, $x,y\in\mathbb{R}^d$, stand for the joint PDF and joint PMF of two RVs $X$, $Y$ (relative to a suitable reference measure $m(dx\times dy)$ $(=m_{X,Y}(dx\times dy))$ on $\mathbb{R}^d\times\mathbb{R}^d$). Correspondingly, $h(X,Y)$ denotes the joint entropy of $X$ and $Y$ and $I(X:Y)$ their mutual entropy:

$$h(X,Y) = -\int g_{X,Y}(x,y)\ln g_{X,Y}(x,y)\,m(dx\times dy), \quad I(X:Y) = h(X)+h(Y)-h(X,Y).$$

We will use representations involving conditional entropies:

$$I(X:Y) = h(X)-h(X|Y) = h(Y)-h(Y|X),$$

where

$$h(X|Y) = h(X,Y)-h(Y), \quad h(Y|X) = h(X,Y)-h(X).$$

In this section we deal with various entropy-continuity properties related to the so-called additive channel where an RV $X$ (a signal) is transformed into the sum $X+U$, with RV $U$ representing 'noise' in the channel. In fact, we will adopt a slightly more general scheme where $X$ is compared with $X\sqrt{\gamma}+U$, $\gamma>0$ being a parameter (sometimes called the signal-to-noise ratio), and study limits where $\gamma\to+\infty$ or $\gamma\to 0+$. We will assume that RVs $X$ and $U$ are independent (though this assumption may be relaxed), and that the 'noise' $U$ has a PDF $f_U(x)$ with $\int f_U(x)\,dx=1$. However, the signal $X$ may have a general distribution including a discrete and an absolutely continuous part.

We begin with the analysis of the behaviour of the mutual entropy $I(X:X\sqrt{\gamma}+U)$ when $\gamma\to+\infty$: this analysis will be used in Section 4, in the course of proving (1.1). We start with the case where $X$ has a PDF $f_X(x)$, with $\int f_X(x)\,dx=1$. Here and below, we use the (standard) notation $(b)_+ = \max[0,b]$ and $(b)_- = \min[0,b]$, $b\in\mathbb{R}$.

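Lemma 2.1. Let $X$, $U$ be independent RVs with PDFs $f_X$ and $f_U$ where $\int f_X(x)\,dx = \int f_U(x)\,dx = 1$. Suppose that (A) $\int f_X(x)|\ln f_X(x)|\,dx < +\infty$ and that (B) for any $\epsilon>0$ there exists a domain $D_\epsilon\subseteq\mathbb{R}^d\times\mathbb{R}^d$ such that for all $\gamma>\gamma_0(\epsilon)$

$$-\int_{(\mathbb{R}^d\times\mathbb{R}^d)\setminus D_\epsilon} dx\,dy\,\mathbf{1}(f_X(x)>0)f_X(x)f_U(y)\left(\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]\right)_- < \epsilon \qquad (2.1)$$

and for $(x,y)\in D_\epsilon$, uniformly in $\gamma>\gamma_0(\epsilon)$, the following inequality holds true:

$$-\left(\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]\right)_- \le \Psi_\epsilon(x,y), \qquad (2.2)$$

where $\Psi_\epsilon(x,y)$ is a function not depending on $\gamma$, with $\int dx\int dy\,f_X(x)f_U(y)\Psi_\epsilon(x,y) < \infty$.

Also assume that (C) PDF $f_X$ is piecewise continuous (that is, $f_X$ is continuous on each of open, pairwise disjoint domains $C_1,\ldots,C_N\subseteq\mathbb{R}^d$ with piecewise smooth boundaries $\partial C_1,\ldots,\partial C_N$, with dimension $\dim\partial C_j<d$, and $f_X=0$ on the complement $\mathbb{R}^d\setminus\cup_{1\le j\le N}(C_j\cup\partial C_j)$). Furthermore, let $f_X$ be bounded: $\sup[f_X(x):\ x\in\mathbb{R}^d] = b < +\infty$. Then

$$h(X) = \lim_{\gamma\to+\infty}\big[I(X:X\sqrt\gamma+U)+h(U/\sqrt\gamma)\big]. \qquad (2.3)$$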

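Proof of Lemma 2.1. Set $Y := X\sqrt\gamma+U$. Since $I(X:Y) = h(X)-h(X|Y)$, the problem is equivalent to proving that

$$\big[h(X|Y)-h(U/\sqrt\gamma)\big]\to 0 \quad\text{as } \gamma\to+\infty.$$

Writing $h(U/\sqrt\gamma) = -\ln\gamma^{d/2} - \int f_U(u)\ln f_U(u)\,du$, we obtain

$$h(X|Y)-h(U/\sqrt\gamma)$$
$$= -\int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int f_U(y-x\sqrt\gamma)\ln\left[\frac{f_X(x)f_U(y-x\sqrt\gamma)}{\int du\,f_X(u)f_U(y-u\sqrt\gamma)}\right]dy + \ln\gamma^{d/2} + \int f_U(u)\ln f_U(u)\,du$$
$$= \int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int f_U(y)\ln\left[\frac{\gamma^{d/2}\int du\,f_X(u)f_U(y+(x-u)\sqrt\gamma)}{f_X(x)}\right]dy$$
$$= \int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int f_U(y)\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]dy := I(\gamma). \qquad (2.4)$$

Here the second equality uses the shift $y\mapsto y+x\sqrt\gamma$ and the third the change of variables $v = y+(x-u)\sqrt\gamma$. Next, we decompose the last integral:

$$I(\gamma) = I_+(\gamma)+I_-(\gamma), \qquad (2.5)$$

where

$$I_+(\gamma) = \int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int f_U(y)\left(\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]\right)_+ dy \qquad (2.6+)$$

and

$$I_-(\gamma) = \int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int f_U(y)\left(\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]\right)_- dy. \qquad (2.6-)$$

The summand $I_+(\gamma)$ is dealt with by using Lebesgue's dominated convergence theorem. In fact, as $\gamma\to+\infty$, for almost all $x,y\in\mathbb{R}^d$,

$$\mathbf{1}(f_X(x)>0)\left(\ln\left[\frac{\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv}{f_X(x)}\right]\right)_+ \to 0, \qquad (2.7)$$

because (a) $f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)\to f_X(x)$ $\forall\,x,y,v\in\mathbb{R}^d$ by continuity of $f_X$, and (b) $\int f_X\big(x+\frac{y-v}{\sqrt\gamma}\big)f_U(v)\,dv\to f_X(x)$ $\forall\,x,y$ since $f_X$ is bounded.

Next, we derive from (2.7) that $I_+(\gamma)\to 0$. Here we write

$$\left(\ln\int f_X\Big(x+\frac{y-v}{\sqrt\gamma}\Big)f_U(v)\,dv - \ln f_X(x)\right)_+ \le |\ln b| + |\ln f_X(x)|$$

and again use Lebesgue's theorem, in conjunction with the assumption that

$$\int f_X(x)|\ln f_X(x)|\,dx < +\infty.$$

The summand $I_-(\gamma)$ requires a different approach. Here we write $I_-(\gamma) = I_-(\gamma,D_\epsilon)+I_-(\gamma,\overline{D}_\epsilon)$, by restricting integration in $dx\,dy$ to $D_\epsilon$ and $\overline{D}_\epsilon = (\mathbb{R}^d\times\mathbb{R}^d)\setminus D_\epsilon$, respectively. The summand $I_-(\gamma,D_\epsilon)\to 0$ by an argument similar to the above (i.e., with the help of Lebesgue's theorem and the bound (2.2)). For $I_-(\gamma,\overline{D}_\epsilon)$ we have, by (2.1), that $\limsup_{\gamma\to+\infty}\,[-I_-(\gamma,\overline{D}_\epsilon)]\le\epsilon$.

Since $\epsilon$ can be made arbitrarily close to 0, the statement of Lemma 2.1 follows. ■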
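
The limit (2.3) can be illustrated in the scalar Gaussian case, where every term is available in closed form. The sketch below is only an illustration, not part of the proof: it assumes $d=1$, $X\sim\mathrm{N}(0,\sigma^2)$ and $U\sim\mathrm{N}(0,1)$, and the function names are hypothetical.

```python
import math

def mutual_entropy_gaussian(sigma2, gamma):
    """I(X : X*sqrt(gamma) + U) in nats for independent X ~ N(0, sigma2), U ~ N(0, 1)."""
    return 0.5 * math.log(1.0 + gamma * sigma2)

def entropy_scaled_noise(gamma):
    """h(U / sqrt(gamma)) for U ~ N(0, 1), i.e. the entropy of N(0, 1/gamma)."""
    return 0.5 * math.log(2.0 * math.pi * math.e / gamma)

def entropy_gaussian(sigma2):
    """h(X) for X ~ N(0, sigma2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)

def lemma_2_1_gap(sigma2, gamma):
    """[I(X : X*sqrt(gamma) + U) + h(U/sqrt(gamma))] - h(X); equals 0.5*ln(1 + 1/(gamma*sigma2))."""
    return (mutual_entropy_gaussian(sigma2, gamma)
            + entropy_scaled_noise(gamma)
            - entropy_gaussian(sigma2))
```

In this example the gap is positive for every finite $\gamma$ and decreases to 0 as $\gamma$ grows, in agreement with (2.3).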



In Section 3 we check conditions of Lemma 2.1 in a number of important cases.

Remark 2.1. An assertion of the type of Lemma 2.1 is crucial for deriving the EPI in Eqn (1.1) by a direct probabilistic method, and the fact that it was not provided in [9] made the proof of the EPI given in [9] incomplete (the same is true of some other papers on this subject).

In the discrete case where signal X takes finitely or countably many values, one has 
the following 



Lemma 2.2. Let $X$ and $U$ be independent RVs. Assume that $X$ is non-constant, admits discrete values $x_1, x_2, \ldots$ with probabilities $p_X(x_1), p_X(x_2), \ldots$, and has $h(X) = -\sum_{x_i}p_X(x_i)\ln p_X(x_i) < +\infty$. Next, assume that $U$ has a bounded PDF $f_U(x)$ with $\int f_U(x)\,dx = 1$ and $\sup[f_U(x):\ x\in\mathbb{R}^d] = a < +\infty$, and

$$\lim_{\alpha\to\pm\infty} f_U(x+\alpha x_0) = 0, \quad \forall\, x,x_0\in\mathbb{R}^d \text{ with } x_0\ne 0.$$

Finally, suppose that $\int f_U(x)|\ln f_U(x)|\,dx < +\infty$. Then

$$h(X) = \lim_{\gamma\to+\infty} I(X:X\sqrt\gamma+U). \qquad (2.8)$$



Proof of Lemma 2.2. Setting, as before, $Y = X\sqrt\gamma+U$, we again reduce our task to proving that $h(X|Y)\to 0$. Now write

$$h(X|Y) = -\sum_{i\ge 1}p_X(x_i)\int f_U(y-x_i\sqrt\gamma)\ln\left[\frac{p_X(x_i)f_U(y-x_i\sqrt\gamma)}{\sum_{j\ge 1}p_X(x_j)f_U(y-x_j\sqrt\gamma)}\right]dy$$
$$= \sum_{i\ge 1}p_X(x_i)\int f_U(y)\ln\left[1+\sum_{j:\,j\ne i}p_X(x_j)p_X(x_i)^{-1}f_U(y-(x_j-x_i)\sqrt\gamma)f_U(y)^{-1}\right]dy. \qquad (2.9)$$

The expression under the logarithm clearly converges to 1 as $\gamma\to+\infty$, $\forall\,i\ge 1$ and $y\in\mathbb{R}^d$. Thus, $\forall\,i\ge 1$ and $y\in\mathbb{R}^d$, the whole integrand

$$f_U(y)\ln\left[1+\sum_{j:\,j\ne i}p_X(x_j)p_X(x_i)^{-1}f_U(y-(x_j-x_i)\sqrt\gamma)f_U(y)^{-1}\right]\to 0.$$

To guarantee the convergence of the integral we set $q_i = \sum_{j:\,j\ne i}p_X(x_j)p_X(x_i)^{-1} = p_X(x_i)^{-1}-1$ and $\psi(y) = \ln f_U(y)$ and use the bound

$$\ln\left[1+\sum_{j:\,j\ne i}p_X(x_j)p_X(x_i)^{-1}f_U(y-(x_j-x_i)\sqrt\gamma)f_U(y)^{-1}\right] \le \ln\big(1+aq_ie^{-\psi(y)}\big)$$
$$\le \mathbf{1}\big(aq_ie^{-\psi(y)}>1\big)\ln\big(2aq_ie^{-\psi(y)}\big) + \mathbf{1}\big(aq_ie^{-\psi(y)}\le 1\big)\ln 2$$
$$\le 2\ln 2+\ln a+\ln(q_i+1)+|\psi(y)|.$$

We then again apply Lebesgue's dominated convergence theorem and deduce that $\lim_{\gamma\to+\infty}h(X|Y) = 0$. ■
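
The convergence $h(X|Y)\to 0$ behind (2.8) can be observed numerically in a simple special case. The sketch below is a hedged illustration: it assumes $d=1$, a binary signal $X=\pm1$ with equal probabilities and standard normal noise $U$, for which both terms in (2.9) coincide by symmetry and reduce to a single integral; the function names, grid width and step size are arbitrary choices, and the integral is approximated by a plain Riemann sum.

```python
import numpy as np

def conditional_entropy_binary(gamma, half_width=12.0, step=1e-3):
    """h(X|Y) from (2.9) for X = +/-1 equiprobable, U ~ N(0,1), Y = X*sqrt(gamma) + U.

    For this signal the ratio f_U(y + 2*sqrt(gamma)) / f_U(y) equals
    exp(-2*sqrt(gamma)*y - 2*gamma), so
    h(X|Y) = integral of f_U(y) * ln(1 + exp(-2*sqrt(gamma)*y - 2*gamma)) dy.
    """
    y = np.arange(-half_width, half_width + step, step)
    noise_pdf = np.exp(-0.5 * y ** 2) / np.sqrt(2.0 * np.pi)
    log_ratio = -2.0 * np.sqrt(gamma) * y - 2.0 * gamma
    # logaddexp(0, t) = ln(1 + e^t), evaluated without overflow.
    integrand = noise_pdf * np.logaddexp(0.0, log_ratio)
    return float(np.sum(integrand) * step)

def mutual_entropy_binary(gamma):
    """I(X : X*sqrt(gamma) + U) = h(X) - h(X|Y) with h(X) = ln 2."""
    return float(np.log(2.0)) - conditional_entropy_binary(gamma)
```

At $\gamma=0$ the conditional entropy equals $h(X)=\ln 2$, and it decreases towards 0 as $\gamma$ increases, so that the mutual entropy approaches $\ln 2$; the truncation to a finite grid means the sketch is only reliable for moderate values of $\gamma$.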



In the general case, the arguments developed lead to the following continuity property:

Lemma 2.3. Let RVs $X$ and $U$ be independent. Assume a general case where $X$ may have discrete and absolutely continuous parts in its distribution while $U$ has a PDF $f_U$ with $\int f_U(x)\,dx = 1$. Suppose the PDF $f_X$, with $\int f_X(x)\,dx := 1-\eta(X)\le 1$, is continuous, bounded and satisfies assumption (B) from Lemma 2.1. Next, suppose that the PDF $f_U$ is bounded and

$$\lim_{\alpha\to\pm\infty} f_U(x+x_0\alpha) = 0, \quad \forall\,x,x_0\in\mathbb{R}^d \text{ with } x_0\ne 0.$$

Finally, assume that $\int g_X(u)|\ln g_X(u)|\,m_X(du) + \int f_U(u)|\ln f_U(u)|\,du < +\infty$. Then

$$h(X) = \lim_{\gamma\to+\infty}\Big[I(X:X\sqrt\gamma+U)+[1-\eta(X)]\,h(U/\sqrt\gamma)\Big].$$

The proof of the EPI (1.1) in Section 4 requires an analysis of the behaviour of $I(X:X\sqrt\gamma+N)$ also when $\gamma\to 0$. Here we are able to cover a general case for RV $X$ in a single assertion:

Lemma 2.4. Let $X$, $U$ be independent RVs. Assume that $U$ has a bounded and continuous PDF $f_U\in C^0(\mathbb{R}^d)$, with $\int f_U(x)\,dx = 1$ and $\sup[f_U(x):\ x\in\mathbb{R}^d] = a < +\infty$, whereas the distribution of $X$ may have discrete and continuous parts. Next, assume, as in Lemma 2.3, that

$$\int g_X(u)|\ln g_X(u)|\,m_X(du) + \int f_U(u)|\ln f_U(u)|\,du < +\infty.$$

Then

$$\lim_{\gamma\to 0} I(X:X\sqrt\gamma+U) = 0. \qquad (2.10)$$



Proof of Lemma 2.4. Setting again $Y = X\sqrt\gamma+U$, we now reduce the task to proving that $h(X|Y)\to h(X)$. Here we write

$$h(X|Y) = -\int g_X(x)\int f_U(y-x\sqrt\gamma)\ln\left[\frac{g_X(x)f_U(y-x\sqrt\gamma)}{\int g_X(u)f_U(y-u\sqrt\gamma)\,m_X(du)}\right]dy\,m_X(dx)$$
$$= \int g_X(x)\int f_U(y)\ln\left[\frac{\int g_X(u)f_U(y+(x-u)\sqrt\gamma)\,m_X(du)}{g_X(x)f_U(y)}\right]dy\,m_X(dx). \qquad (2.11)$$

Due to continuity of $f_U$, the ratio under the logarithm converges to $(g_X(x))^{-1}$ as $\gamma\to 0$, $\forall\,x,y\in\mathbb{R}^d$. Hence, the integral in (2.11) converges to $h(X)$ as $\gamma\to 0$. Again, the proof is completed with the help of the Lebesgue dominated convergence theorem. ■

Remark 2.2. Lemma 2.4 is another example of a missing step in earlier direct probabilistic proofs of the EPI.

3. Uniform integrability

As was said before, in this section we discuss several cases where assumptions of Lemma 2.1 can be checked. We begin with the case where PDF $f_X$ is lower-bounded by a multiple of the normal PDF. Let $\phi_\Sigma$ (or, briefly, $\phi$) stand for the standard $d$-variate normal PDF with mean vector 0 and a $d\times d$ covariance matrix $\Sigma$, and we assume that $\Sigma$ is strictly positive definite:

$$\phi_\Sigma(x) = \frac{1}{(2\pi)^{d/2}\det\Sigma^{1/2}}\exp\Big[-\frac12(x,\Sigma^{-1}x)\Big], \quad x\in\mathbb{R}^d. \qquad (3.1)$$

Here and below, $(\,\cdot\,,\,\cdot\,)$ stands for the Euclidean scalar product in $\mathbb{R}^d$.



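Proposition 3.1. Assume that $f_X(x)\ge\alpha\phi_\Sigma(x)$, $x\in\mathbb{R}^d$, where $\alpha\in(0,1]$, and

$$\int f_X(x)|\ln f_X(x)|\,dx,\quad \int f_X(x)(x,\Sigma^{-1}x)\,dx,\quad \int f_U(y)(y,\Sigma^{-1}y)\,dy < +\infty. \qquad (3.2)$$

Then assumptions (A) and (B) of Lemma 2.1 hold, and the bound in Eqn (2.1) is satisfied, with $\gamma_0(\epsilon) = 1$, $D_\epsilon = \mathbb{R}^d\times\mathbb{R}^d$ and

$$\Psi(x,y) = (x,\Sigma^{-1}x) + 2(y,\Sigma^{-1}y) + \left|\ln\int\exp\big[-2(v,\Sigma^{-1}v)\big]f_U(v)\,dv\right| + |\ln f_X(x)| + \rho, \qquad (3.3)$$

where

$$\rho = \left|\ln\frac{\alpha}{(2\pi)^{d/2}\det\Sigma^{1/2}}\right|.$$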

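Proof of Proposition 3.1. Write:

$$\int f_X\Big(x+\frac{y-v}{\sqrt\gamma}\Big)f_U(v)\,dv \ge \alpha\int\phi_\Sigma\Big(x+\frac{y-v}{\sqrt\gamma}\Big)f_U(v)\,dv$$
$$= \frac{\alpha}{(2\pi)^{d/2}\det\Sigma^{1/2}}\int\exp\Big[-\frac12\Big(x+\frac{y-v}{\sqrt\gamma},\Sigma^{-1}\Big(x+\frac{y-v}{\sqrt\gamma}\Big)\Big)\Big]f_U(v)\,dv$$
$$\ge \frac{\alpha}{(2\pi)^{d/2}\det\Sigma^{1/2}}\exp\big[-(x,\Sigma^{-1}x)\big]\exp\Big[-\frac{2}{\gamma}(y,\Sigma^{-1}y)\Big]\int\exp\Big[-\frac{2}{\gamma}(v,\Sigma^{-1}v)\Big]f_U(v)\,dv.$$

Here the last step uses the elementary bound $(z+w,\Sigma^{-1}(z+w))\le 2(z,\Sigma^{-1}z)+2(w,\Sigma^{-1}w)$, applied twice. Hence, we obtain that $\forall\,x,y\in\mathbb{R}^d$,

$$-\left(\ln\int f_X\Big(x+\frac{y-v}{\sqrt\gamma}\Big)f_U(v)\,dv - \ln f_X(x)\right)_- \le (x,\Sigma^{-1}x) + \frac{2}{\gamma}(y,\Sigma^{-1}y)$$
$$+ \left|\ln\int\exp\Big[-\frac{2}{\gamma}(v,\Sigma^{-1}v)\Big]f_U(v)\,dv\right| + |\ln f_X(x)| + \left|\ln\frac{\alpha}{(2\pi)^{d/2}\det\Sigma^{1/2}}\right|,$$

with the RHS of this bound decreasing with $\gamma$. For $\gamma = 1$, it gives the bound of Eqn (3.3). ■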

It is not difficult to extend the argument from the proof of Proposition 3.1 to the case where a more general lower bound holds:

$$f_X(x)\ge\exp(-P(x)), \quad x\in\mathbb{R}^d,$$

where $P(x)$ is a polynomial in $x\in\mathbb{R}^d$, bounded from below. Of course, existence of finite polynomial moments should be assumed, for both PDFs $f_X$ and $f_U$. Moreover, the lower bounds for $f_X$ can be replaced by lower bounds for $f_U(y)$; in particular, this covers the case where $f_U(y)\ge\alpha\phi(y)$, $y\in\mathbb{R}^d$, $\alpha\in(0,1]$, and Eqn (3.2) holds true.

An 'opposite' case of substantial interest is where both PDFs $f_X$ and $f_U$ have compact supports. In this paper we do not address this case in full generality, leaving this for future work. However, we will discuss a couple of examples to show what mechanisms are behind convergence. For $A,B\in\mathbb{R}^d$ denote

$$[[A,B]] = \mathop{\times}_{1\le j\le d}[A_j,B_j]$$

(tacitly assuming that $A_i<B_i$ $\forall\,i$).

Proposition 3.2. Let PDF $f_X$ admit finitely many values. Further, assume that PDF $f_U$ has a compact support $[[A,B]]$, and satisfies the lower bound

$$f_U(y)\ge\alpha\prod_{1\le i\le d}(B_i-A_i)^{-1}\,\mathbf{1}(A_i<y_i<B_i,\ i=1,\ldots,d),$$

where $0<\alpha<1$ and $A=(A_1,\ldots,A_d)$, $B=(B_1,\ldots,B_d)\in\mathbb{R}^d$ obey $-\infty<A_i<B_i<+\infty$. Then assumption (B) of Lemma 2.1 holds with function $\Psi_\epsilon(x,y) = 0$ and domains $D_\epsilon = \mathop{\times}_{1\le i\le d}D_\epsilon^{(i)}$ where the sets $D_\epsilon^{(i)}$ are defined in (3.7). Moreover, $\gamma_0(\epsilon) = C\epsilon^{-2/d}$.



Proof of Proposition 3.2. Consider first a simplified scalar case where $f_X(x) = \frac{1}{b-a}\mathbf{1}(a<x<b)$, $-\infty<a<b<\infty$, while $f_U$ has support $[A,B]$ and satisfies the bound $f_U(y)\ge\frac{\alpha}{B-A}\mathbf{1}(A<y<B)$, with $0<\alpha<1$ and $-\infty<A<B<+\infty$. Take $\gamma>4(B-A)^2/(b-a)^2$. Then write

$$I := -\int dx\,\mathbf{1}(f_X(x)>0)f_X(x)\int dy\,f_U(y)\left(\ln\left[\int f_X\Big(x+\frac{y-v}{\sqrt\gamma}\Big)f_U(v)\,dv\Big/f_X(x)\right]\right)_-$$
$$= -\frac{1}{b-a}\int_a^b dx\int_A^B dy\,f_U(y)\ln\int_{A\vee(y+(x-b)\sqrt\gamma)}^{B\wedge(y+(x-a)\sqrt\gamma)}f_U(v)\,dv := I(0)+I(1). \qquad (3.4)$$

The decomposition $I = I(0)+I(1)$ in the RHS of (3.4) extracts an 'interior' term $I(0)$, which vanishes, and a 'boundary' term $I(1)$ which has to be estimated. More precisely, $I(0) = I_-(0)+I_0(0)+I_+(0)$ where

$$I_-(0) = -\frac{1}{b-a}\int_a^{a+(B-A)/\sqrt\gamma}dx\int_{B-(x-a)\sqrt\gamma}^{B}dy\,f_U(y)\ln\int_A^B dv\,f_U(v) = 0, \qquad (3.5)$$

$$I_0(0) = -\frac{1}{b-a}\int_{a+(B-A)/\sqrt\gamma}^{b-(B-A)/\sqrt\gamma}dx\int_A^B dy\,f_U(y)\ln\int_A^B dv\,f_U(v) = 0, \qquad (3.6a)$$

and

$$I_+(0) = -\frac{1}{b-a}\int_{b-(B-A)/\sqrt\gamma}^{b}dx\int_A^{A+(b-x)\sqrt\gamma}dy\,f_U(y)\ln\int_A^B dv\,f_U(v) = 0. \qquad (3.6b)$$

Correspondingly, set $D_\epsilon$ in the case under consideration is the union of three sets $D_\epsilon = D_{\epsilon,-}\cup D_{\epsilon,0}\cup D_{\epsilon,+}$ where

$$D_{\epsilon,-} = \big\{(x,y):\ a<x<a+(B-A)\sqrt\epsilon,\ B-(x-a)/\sqrt\epsilon<y<B\big\},$$
$$D_{\epsilon,0} = \big(a+(B-A)\sqrt\epsilon,\ b-(B-A)\sqrt\epsilon\big)\times(A,B)$$

and

$$D_{\epsilon,+} = \big\{(x,y):\ b-(B-A)\sqrt\epsilon<x<b,\ A<y<A+(b-x)/\sqrt\epsilon\big\}. \qquad (3.7)$$
Observe that so far we have not used the upper bound for PDF fy. See Figure 1. 
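Further, $I(1) = I_-(1)+I_+(1)$, where $I_-(1)$ and $I_+(1)$ are the contributions to (3.4) from the parts of the boundary strips $a<x<a+(B-A)/\sqrt\gamma$ and $b-(B-A)/\sqrt\gamma<x<b$ not covered by (3.5) and (3.6b):

$$I_-(1) = -\frac{1}{b-a}\int_a^{a+(B-A)/\sqrt\gamma}dx\int_A^{B-(x-a)\sqrt\gamma}dy\,f_U(y)\ln\int_A^{y+(x-a)\sqrt\gamma}dv\,f_U(v),$$
$$I_+(1) = -\frac{1}{b-a}\int_{b-(B-A)/\sqrt\gamma}^{b}dx\int_{A+(b-x)\sqrt\gamma}^{B}dy\,f_U(y)\ln\int_{y+(x-b)\sqrt\gamma}^{B}dv\,f_U(v).$$

We have to upper-bound integrals $I_-(1)$ and $I_+(1)$. For definiteness, we focus on $I_-(1)$; the changes for $I_+(1)$ are automatic. Using the lower bound for $f_U$, we have that

$$I_-(1)\le -\frac{1}{b-a}\int_a^{a+(B-A)/\sqrt\gamma}dx\int_A^{B-(x-a)\sqrt\gamma}dy\,f_U(y)\ln\Big[\frac{\alpha}{B-A}\big(y+(x-a)\sqrt\gamma-A\big)\Big]$$
$$= -\frac{1}{b-a}\int_0^{(B-A)/\sqrt\gamma}dx\int_0^{B-A-x\sqrt\gamma}dy\,f_U(y+A)\ln\Big[\frac{\alpha}{B-A}\big(y+x\sqrt\gamma\big)\Big]$$
$$= -\frac{(B-A)^2}{b-a}\int_0^{1/\sqrt\gamma}dx\int_0^{1-x\sqrt\gamma}dy\,f_U(y(B-A)+A)\ln\big[\alpha(y+x\sqrt\gamma)\big]$$
$$= -\frac{(B-A)^2}{(b-a)\sqrt\gamma}\int_0^1 dx\int_0^{1-x}dy\,f_U(y(B-A)+A)\ln\big[\alpha(y+x)\big]. \qquad (3.10)$$

It remains to check that the integral in the RHS of (3.10) is finite. But the integrand in the RHS of (3.10) has a singularity at $x=y=0$ only, which is integrable. A similar calculation applies to $I_+(1)$. This yields that, in the simplified case under consideration, the integral $I$ in Eqn (3.4) obeys $I\le C/\sqrt\gamma$. The extension of this argument to the general scalar case where $f_X$ takes a finite number of values is straightforward. In the multidimensional situation, if we assume that $X\sim\mathrm{U}([[A,B]])$ (for instance, $X\sim\mathrm{U}([0,1]^d)$), then a similar argument gives that $I\le C\gamma^{-d/2}$; finally, a similar bound holds when $f_X$ takes a finite number of values. ■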



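Proposition 3.3. Let both $f_X$ and $f_U$ have a pyramidal form

$$f_X(x) = \prod_{i=1}^d(1-|x_i|)_+, \quad f_U(y) = \prod_{i=1}^d\frac{1}{a_i}\Big(1-\frac{1}{a_i}|y_i-b_i|\Big)_+, \quad x,y\in\mathbb{R}^d, \qquad (3.11)$$

where $a_1,\ldots,a_d>0$ and $-\infty<b_i<\infty$, $i=1,\ldots,d$. Then assumption (B) of Lemma 2.1 holds, with

$$D_\epsilon = \big([[-1+\epsilon,-\epsilon]]\cup[[\epsilon,1-\epsilon]]\big)\times[[b-a,b+a]], \qquad (3.12)$$

where $a=(a_1,\ldots,a_d)$, $b=(b_1,\ldots,b_d)$ and $\epsilon=(\epsilon,\ldots,\epsilon)\in\mathbb{R}^d$ and $\epsilon\in(0,1/2)$. Further, $\gamma_0(\epsilon) = 1/(4\epsilon^2)$ and function $\Psi_\epsilon(x,y)$ is given by

$$\Psi_\epsilon(x,y) = \prod_{i=1}^d\left|\ln\Big[1-2\epsilon\,\frac{a_i+b_i+y_i\,(\mathrm{sgn}\,x_i)}{1-|x_i|}\Big]\right|. \qquad (3.13)$$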



Proof of Proposition 3.3. First, consider a scalar case (where both PDFs have a triangular form) and assume, w.l.o.g., that $b=0$. It is convenient to denote the rectangle $(-1,1)\times(-a,a)$ in the $(x,y)$-plane by $\mathcal{R}$. For $(x,y)\in\mathcal{R}$ set:

$$J\,(=J(x,y)) = \int dv\,f_U(v)f_X\Big(x+\frac{y-v}{\sqrt\gamma}\Big) = \sqrt\gamma\int du\,f_X(u)f_U((x-u)\sqrt\gamma+y). \qquad (3.14)$$

Then, for $\sqrt\gamma>a$, on the parallelogram

$$\mathcal{P}_+\,(=\mathcal{P}_+(\gamma)) = \big\{(x,y)\in\mathcal{R}:\ a-x\sqrt\gamma<y<\sqrt\gamma(1-x)-a\big\},$$

we have that

$$J = \sqrt\gamma\int_{x+(y-a)/\sqrt\gamma}^{x+(y+a)/\sqrt\gamma}(1-u)\,f_U(\sqrt\gamma(x-u)+y)\,du,$$

an integral which splits at $u=x+y/\sqrt\gamma$ into two pieces, one for each linear branch of $f_U$. By direct computation,

$$J = 1-x-\frac{1}{\sqrt\gamma}(a+y).$$

So, for $\sqrt\gamma>a$, in parallelogram $\mathcal{P}_+$, we have

$$J(x,y) = 1-x-\frac{1}{\sqrt\gamma}(y+a) \quad\text{and}\quad \ln\frac{J(x,y)}{f_X(x)} = \ln\Big[1-\frac{a+y}{\sqrt\gamma(1-x)}\Big]. \qquad (3.15)$$

Geometrically, parallelogram $\mathcal{P}_+$ corresponds to the case where the support of the scaled PDF

$$u\mapsto\sqrt\gamma f_U((x-u)\sqrt\gamma+y) \qquad (3.16)$$

lies entirely in $(0,1)$, the right-hand half of the support for $f_X$. Cf. Figure 2 below. Similarly, on the symmetric parallelogram

$$\mathcal{P}_-\,(=\mathcal{P}_-(\gamma)) = \big\{(x,y)\in\mathcal{R}:\ -(1+x)\sqrt\gamma+a<y<-\sqrt\gamma x-a\big\},$$

we have

$$J(x,y) = 1+x-\frac{1}{\sqrt\gamma}(a-y) \quad\text{and}\quad \ln\frac{J(x,y)}{f_X(x)} = \ln\Big[1-\frac{a-y}{\sqrt\gamma(1+x)}\Big]. \qquad (3.17)$$

This parallelogram corresponds to the case where the support of the PDF in (3.16) lies in $(-1,0)$. On the union $\mathcal{P} = \mathcal{P}_+\cup\mathcal{P}_-$,

$$H(x,y,\gamma) := \ln\frac{J(x,y)}{f_X(x)} = \ln\Big[1-\frac{a+y\,(\mathrm{sgn}\,x)}{\sqrt\gamma(1-|x|)}\Big]. \qquad (3.18)$$

If we set $H(x,y,\gamma) = 0$ on $\mathcal{R}\setminus\mathcal{P}$ then function $H(x,y,\gamma)$ converges to 0 pointwise on the whole of $\mathcal{R}$. Given $\epsilon\in(0,1/2)$, take $\gamma>1/(4\epsilon^2)$. On the set $D_\epsilon$, we have that

$$|H(x,y,\gamma)|\le|H(x,y,1/(4\epsilon^2))| := \Psi_\epsilon(x,y).$$

The complement $\mathcal{R}\setminus\mathcal{P}$ is partitioned into six domains (a right triangle with a vertex at point $(-1,-a)$ plus an adjacent trapezoid on the left, a right triangle with a vertex at point $(1,a)$ plus an adjacent trapezoid on the right, and two adjacent parallelograms in the middle). These domains correspond to various positions of the 'centre' and the endpoints of the support of the PDF $u\mapsto\sqrt\gamma f_U((x-u)\sqrt\gamma+y)$ relative to $(-1,1)$ (excluding the cases covered by set $\mathcal{P}$). On each of these domains, function $J(x,y)$ is a polynomial, of degree $\le 3$. Viz., on the RHS triangle $\{(x,y)\in\mathcal{R}:\ \sqrt\gamma(1-x)<y\}$,

$$J(x,y) = \sqrt\gamma\int_{x+(y-a)/\sqrt\gamma}^{1}(1-u)\,f_U(\sqrt\gamma(x-u)+y)\,du.$$

Next, in the RHS trapezoid $\{(x,y)\in\mathcal{R}:\ \sqrt\gamma(1-x)-a<y<\sqrt\gamma(1-x)\}$,

$$J(x,y) = \sqrt\gamma\left[\int_{x+(y-a)/\sqrt\gamma}^{x+y/\sqrt\gamma}+\int_{x+y/\sqrt\gamma}^{1}\right](1-u)\,f_U(\sqrt\gamma(x-u)+y)\,du.$$

Finally, on the RHS parallelogram $\{(x,y)\in\mathcal{R}:\ -x\sqrt\gamma<y<a-x\sqrt\gamma\}$,

$$J(x,y) = \sqrt\gamma\int_{x+(y-a)/\sqrt\gamma}^{0}(1+u)\,f_U(\sqrt\gamma(x-u)+y)\,du + \sqrt\gamma\left[\int_{0}^{x+y/\sqrt\gamma}+\int_{x+y/\sqrt\gamma}^{x+(y+a)/\sqrt\gamma}\right](1-u)\,f_U(\sqrt\gamma(x-u)+y)\,du.$$

Similar formulas hold for the LHS counterparts. The integrals

$$\iint f_X(x)f_U(y)\big|\ln[J(x,y)/f_X(x)]\big|\,dx\,dy \qquad (3.19)$$

over each of these domains are assessed by inspection and decay as $O(1/\sqrt\gamma)$.

In addition, to cover the complement $\mathcal{R}\setminus D_\epsilon$, we have to consider the set $\mathcal{P}\setminus D_\epsilon$ and integrate the function $J(x,y)$ from Eqns (3.15) and (3.17). This is again done by inspection; the corresponding integral is assessed as $O(\epsilon)$. Hence, the integral (3.19) over the entire complement $\mathcal{R}\setminus D_\epsilon$ is $\le C\epsilon$, for an appropriately chosen constant $C>0$.

The above argument can be easily extended to the $d$-dimensional case since we will be dealing with products of integrals. ■



4. The entropy-power inequality 

In this section we show how to deduce the EPI (1.1) from the lemmas established in Section 2. We begin with a convenient representation of the mutual entropy $I(X:X\sqrt\gamma+U)$ in the case where $U$ is a $d$-variate normal RV, with a short (and elementary) proof. We do not consider this proof as new: it follows the same line as [3] but is more elementary.

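Lemma 4.1. Let $X$ and $N$ be two independent RVs, where $N\sim\mathrm{N}(0,\Sigma)$ while $X$ has a PDF $f_X$. Suppose that $\int f_X(x)\|x\|^2\,dx<+\infty$. Given $\gamma>0$, write the mutual entropy between $X$ and $X\sqrt\gamma+N$:

$$I(X:X\sqrt\gamma+N) = \iint f_X(x)\phi(u-x\sqrt\gamma)\ln\big[f_X(x)\phi(u-x\sqrt\gamma)\big]\,du\,dx$$
$$\qquad - \int f_X(x)\ln f_X(x)\,dx - \int f_{X\sqrt\gamma+N}(u)\ln f_{X\sqrt\gamma+N}(u)\,du,$$

where

$$f_{X\sqrt\gamma+N}(u) = \int f_X(x)\phi(u-x\sqrt\gamma)\,dx. \qquad (4.1a)$$

Then

$$\frac{d}{d\gamma}\Big[I(X:X\sqrt\gamma+N)+h(N/\sqrt\gamma)\Big] = \frac12 M(X;\gamma)-\frac{1}{2\gamma}, \qquad (4.1b)$$

where

$$M(X;\gamma) = \mathbb{E}\,\big\|X-\mathbb{E}(X|X\sqrt\gamma+N)\big\|^2$$

and the norm $\|x\|^2 = (x,\Sigma^{-1}x)$.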

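Proof of Lemma 4.1. Differentiate the expression for $I(X:X\sqrt\gamma+N)$ given in (4.1a), and observe that the derivative of the joint entropy $h(X,X\sqrt\gamma+N)$ vanishes, as $h(X,X\sqrt\gamma+N)$ does not change with $\gamma>0$:

$$h(X,X\sqrt\gamma+N) = -\iint f_X(x)\phi(u-x\sqrt\gamma)\big[\ln f_X(x)+\ln\phi(u-x\sqrt\gamma)\big]\,dx\,du = h(X)+h(N).$$

The derivative of the marginal entropy $h(X\sqrt\gamma+N)$ requires some calculations:

$$\frac{d}{d\gamma}h(X\sqrt\gamma+N) = -\frac{d}{d\gamma}\int f_{X\sqrt\gamma+N}(u)\ln f_{X\sqrt\gamma+N}(u)\,du$$
$$= -\frac{1}{2\sqrt\gamma}\iint f_X(y)\phi(u-\sqrt\gamma y)\big((u-\sqrt\gamma y),\Sigma^{-1}y\big)\,dy\ \ln\int f_X(z)\phi(u-\sqrt\gamma z)\,dz\,du$$
$$\quad -\frac{1}{2\sqrt\gamma}\int\left[\int f_X(y)\phi(u-\sqrt\gamma y)\,dy\right]\frac{\int f_X(w)\phi(u-\sqrt\gamma w)\big((u-\sqrt\gamma w),\Sigma^{-1}w\big)\,dw}{\int f_X(z)\phi(u-\sqrt\gamma z)\,dz}\,du. \qquad (4.2)$$

The second summand vanishes, as (i) the integrals

$$\int f_X(y)\phi(u-\sqrt\gamma y)\,dy \quad\text{and}\quad \int f_X(z)\phi(u-\sqrt\gamma z)\,dz$$

cancel each other and (ii) the remaining integration can be taken first in $du$, which yields 0 for $\forall\,w$. The first integral we integrate by parts. This leads to the representation

$$\frac{d}{d\gamma}I(X:X\sqrt\gamma+N) = \frac{1}{2\sqrt\gamma}\iint f_X(y)\phi(u-\sqrt\gamma y)\,\frac{\int f_X(x)\phi(u-\sqrt\gamma x)\big((u-\sqrt\gamma x),\Sigma^{-1}y\big)\,dx}{\int f_X(z)\phi(u-\sqrt\gamma z)\,dz}\,dy\,du$$
$$= \frac{1}{2\sqrt\gamma}\iint dy\,du\,f_X(y)\phi(u-\sqrt\gamma y)\,\frac{\int f_X(x)\phi(u-\sqrt\gamma x)\Big[\big((u-\sqrt\gamma y),\Sigma^{-1}y\big)+\sqrt\gamma\big((y-x),\Sigma^{-1}y\big)\Big]dx}{\int f_X(z)\phi(u-\sqrt\gamma z)\,dz}. \qquad (4.3)$$

The integral arising from the summand $\big((u-\sqrt\gamma y),\Sigma^{-1}y\big)$ vanishes, because the mean vector in PDF $\phi$ is zero. The remaining contribution, from $\sqrt\gamma\big((y-x),\Sigma^{-1}y\big)$, is equal to

$$\frac12\,\mathbb{E}\,\big\|X-\mathbb{E}(X|X\sqrt\gamma+N)\big\|^2.$$

On the other hand, in terms of the PDFs,

$$\iint\left\|x-\frac{\int f_X(y)\phi(u-\sqrt\gamma y)\,y\,dy}{\int f_X(z)\phi(u-\sqrt\gamma z)\,dz}\right\|^2 f_X(x)\phi(u-\sqrt\gamma x)\,dx\,du = \mathbb{E}\,\big\|X-\mathbb{E}(X|X\sqrt\gamma+N)\big\|^2 = M(X;\gamma).$$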



We are now going to derive the EPI (1.1) following the line of argument proposed in [9] and based on Lemma 4.1. First, suppose that $X$ is an RV with a PDF $f_X$ where $\int f_X(x)\,dx = 1$. Then we assume that $f_X(x)$ satisfies the assumptions stated in Lemma 2.1 and Lemma 2.4 and use these lemmas with $U = N\sim\mathrm{N}(0,\Sigma)$. Consequently, $\forall\,\epsilon>0$,

$$h(X) = \lim_{\gamma\to+\infty}\Big[I(X:X\sqrt\gamma+N)+h(N/\sqrt\gamma)\Big]$$
$$= \int_\epsilon^{+\infty}\frac{d}{d\gamma}\Big[I(X:X\sqrt\gamma+N)+h(N/\sqrt\gamma)\Big]d\gamma + I(X:X\sqrt\epsilon+N)+h(N/\sqrt\epsilon)$$
$$= \frac12\int_\epsilon^{+\infty}\Big[M(X;\gamma)-\frac{1}{\gamma}\mathbf{1}(\gamma>1)\Big]d\gamma + h(N)+I(X:X\sqrt\epsilon+N). \qquad (4.4)$$

Here we use the identity $\int_\epsilon^1(1/\gamma)\,d\gamma = -\ln\epsilon$. By Lemma 2.4 the last term in (4.4) tends to 0 as $\epsilon\to 0$. Hence, for an RV $X$ with PDF $f_X\in C^0$ we obtain

$$h(X) = h(N)+\frac12\int_0^{+\infty}\Big[M(X;\gamma)-\frac{1}{\gamma}\mathbf{1}(\gamma>1)\Big]d\gamma. \qquad (4.5)$$

Remark 4.1. A straightforward calculation shows that Eqn (4.5) can be written in an equivalent form

$$h(X) = \frac12\ln\big(2\pi e\sigma_X^2\big)-\frac12\int_0^{\infty}\Big[\frac{\sigma_X^2}{1+\gamma\sigma_X^2}-M(X;\gamma)\Big]d\gamma \qquad (4.6)$$

used in Eqn (4) from [9]. We thank the referee who pointed out this fact in his comments.
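
The representation (4.5) can be checked in the scalar Gaussian case, where the integral is elementary. The sketch below is an illustration under stated assumptions: $d=1$, $N\sim\mathrm{N}(0,1)$, $X\sim\mathrm{N}(0,\sigma^2)$, for which the standard formula $M(X;\gamma)=\sigma^2/(1+\gamma\sigma^2)$ holds; the function names and the truncation point `upper` are hypothetical choices.

```python
import math

def integral_4_5_gaussian(sigma2, upper):
    """Integral over (0, upper) of [M(X;gamma) - 1(gamma>1)/gamma] for X ~ N(0, sigma2).

    Uses the antiderivatives ln(1 + gamma*sigma2) and ln(gamma); requires upper > 1.
    """
    if upper <= 1.0:
        raise ValueError("upper must exceed 1")
    return math.log(1.0 + upper * sigma2) - math.log(upper)

def entropy_via_4_5(sigma2, upper=1e12):
    """Right-hand side of (4.5) with the integral truncated at `upper`."""
    h_noise = 0.5 * math.log(2.0 * math.pi * math.e)  # h(N) for N ~ N(0, 1)
    return h_noise + 0.5 * integral_4_5_gaussian(sigma2, upper)
```

As the truncation point grows, the integral tends to $\ln\sigma^2$, so the right-hand side tends to $\frac12\ln(2\pi e\sigma^2)$, which is the differential entropy of $X$ and also agrees with (4.6), where the integrand vanishes identically for Gaussian $X$.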



The proof of EPI is based on Eqn (4.5) and the following result from 0. 



Lemma 4.2. (Q, Theorem 6) Let $\mathcal{K}$ be a given class of probability distributions on $\mathbb{R}^d$, closed under convex linear combinations and convolutions. The inequality

$$h(X_1\cos\theta+X_2\sin\theta)\ge h(X_1)\cos^2\theta+h(X_2)\sin^2\theta, \qquad (4.7)$$

for any $\theta\in[0,2\pi]$ and any pair of independent RVs $X_1$, $X_2$ with distributions from $\mathcal{K}$, holds true iff the entropy power inequality is valid for any pair of RVs $X_1$, $X_2$ with distributions from $\mathcal{K}$.



Theorem 4.1. Let $U$ be $d$-variate normal $\mathrm{N}(0,I)$. Assume that RVs $X_1$, $X_2$ take values in $\mathbb{R}^d$ and have continuous and bounded PDFs $f_{X_1}(x)$, $f_{X_2}(x)$, $x\in\mathbb{R}^d$, satisfying conditions (A)-(B) of Lemma 2.1. Assume that the differential entropies $h(X_1)$ and $h(X_2)$ satisfy $-\infty<h(X_1),h(X_2)<+\infty$. Then the EPI (see Eqns (1.1)-(1.2)) holds true.



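Proof of Theorem 4.1. The proof follows Ref. [9] and is provided here only for completeness of presentation. The result of Theorem 4.1 may also be established for piecewise continuous PDFs $f_{X_1}(x)$ and $f_{X_2}(x)$ (cf. Lemma 3.2). According to Lemma 4.2, it suffices to check bound (4.7) $\forall\,\theta\in(0,2\pi)$ and $\forall$ pair of RVs $X_1$, $X_2$ with continuous and bounded PDFs $f_{X_i}(x)$, $i=1,2$. Take any such pair and let $N$ be $\mathrm{N}(0,I)$ where $I$ is the $d\times d$ unit matrix. Following the argument developed in [9], we apply formula (4.5) for the RV $X = X_1\cos\theta+X_2\sin\theta$:

$$h(X_1\cos\theta+X_2\sin\theta) = h(N)+\frac12\int_0^{+\infty}\Big[M(X_1\cos\theta+X_2\sin\theta;\gamma)-\mathbf{1}(\gamma>1)\frac{1}{\gamma}\Big]d\gamma.$$

Since $\cos^2\theta+\sin^2\theta = 1$, the terms $h(N)$ and $\mathbf{1}(\gamma>1)/\gamma$ split in the same proportion as the right-hand side of (4.7), so to verify Eqn (4.7) we need to check that

$$M(X_1\cos\theta+X_2\sin\theta;\gamma)\ge\cos^2\theta\,M(X_1;\gamma)+\sin^2\theta\,M(X_2;\gamma). \qquad (4.8)$$

To this end, we take two independent RVs $N_1,N_2\sim\mathrm{N}(0,I)$ and set

$$Z_1 = X_1\sqrt\gamma+N_1, \quad Z_2 = X_2\sqrt\gamma+N_2, \quad\text{and}\quad Z = Z_1\cos\theta+Z_2\sin\theta.$$

Here $Z = X\sqrt\gamma+(N_1\cos\theta+N_2\sin\theta)$, and $N_1\cos\theta+N_2\sin\theta\sim\mathrm{N}(0,I)$. Then inequality (4.8) holds true because conditioning on the pair $(Z_1,Z_2)$ cannot increase the mean-square error compared with conditioning on $Z$, and the pairs $(X_1,Z_1)$ and $(X_2,Z_2)$ are independent:

$$\mathbb{E}\,\|X-\mathbb{E}(X|Z)\|^2\ge\mathbb{E}\,\|X-\mathbb{E}(X|Z_1,Z_2)\|^2$$
$$= \mathbb{E}\,\|X_1-\mathbb{E}(X_1|Z_1)\|^2\cos^2\theta+\mathbb{E}\,\|X_2-\mathbb{E}(X_2|Z_2)\|^2\sin^2\theta.$$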
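
Inequality (4.8) can be illustrated in the scalar Gaussian case, where the mean-square error is explicit. The sketch below is an illustration only, assuming $d=1$, unit-variance noise and independent $X_i\sim\mathrm{N}(0,s_i)$, so that $M(X_i;\gamma)=s_i/(1+\gamma s_i)$ and $X_1\cos\theta+X_2\sin\theta\sim\mathrm{N}(0,\,s_1\cos^2\theta+s_2\sin^2\theta)$; the function names are hypothetical.

```python
import math

def mmse_gaussian(variance, gamma):
    """M(X; gamma) for X ~ N(0, variance) observed through X*sqrt(gamma) + N, N ~ N(0, 1)."""
    return variance / (1.0 + gamma * variance)

def mmse_gap(s1, s2, theta, gamma):
    """LHS minus RHS of (4.8) for independent X1 ~ N(0, s1), X2 ~ N(0, s2)."""
    c2 = math.cos(theta) ** 2
    t2 = math.sin(theta) ** 2
    mixed_variance = c2 * s1 + t2 * s2  # variance of X1*cos(theta) + X2*sin(theta)
    lhs = mmse_gaussian(mixed_variance, gamma)
    rhs = c2 * mmse_gaussian(s1, gamma) + t2 * mmse_gaussian(s2, gamma)
    return lhs - rhs
```

In this example the gap is non-negative because the map $s\mapsto s/(1+\gamma s)$ is concave, and it vanishes when $s_1=s_2$, which matches the equality case of the EPI for Gaussian variables with equal variances.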